CLEANing the Reward: Counterfactual Actions to Remove Exploratory Action Noise in Multiagent Learning
Authors
Abstract
Coordinating the joint actions of agents in cooperative multiagent systems is a difficult problem in many real-world domains. Learning in such multiagent systems can be slow because an agent may need not only to learn how to behave in a complex environment, but also to account for the actions of other learning agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent's reward signal. This learning noise can have unforeseen and often undesirable effects on the resultant system performance. We define such noise as exploratory action noise, demonstrate the critical impact it can have on the learning process in multiagent settings, and introduce a reward structure to effectively remove such noise from each agent's reward signal. In particular, we introduce two types of Coordinated Learning without Exploratory Action Noise (CLEAN) rewards that allow an agent to estimate the counterfactual reward it would have received had it taken an alternative action. We empirically show that agents using CLEAN rewards outperform agents using both traditional global rewards and shaped difference rewards in two domains.
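To make the idea concrete, below is a minimal, stateless sketch of how a CLEAN-style counterfactual reward could be computed, assuming the global reward function can be re-evaluated on hypothetical joint actions. All names here (G, Q, alpha, and the toy reward itself) are illustrative assumptions, not the authors' implementation: every agent executes its greedy action, so the executed joint action carries no exploratory noise, and each agent privately scores a counterfactual exploratory action offline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 5, 4
Q = np.zeros((n_agents, n_actions))  # per-agent action-value estimates
alpha = 0.1                          # learning rate

def G(joint_action):
    """Stand-in global reward for this sketch: counts agents picking action 0."""
    return float(np.sum(np.asarray(joint_action) == 0))

# 1) All agents act greedily, so no agent's exploration perturbs the
#    joint action that is actually executed.
joint = np.array([int(np.argmax(Q[i])) for i in range(n_agents)])
g = G(joint)

for i in range(n_agents):
    # 2) Each agent privately draws a counterfactual (exploratory) action...
    c = int(rng.integers(n_actions))
    # 3) ...and evaluates the global reward as if only it had deviated.
    counterfactual = joint.copy()
    counterfactual[i] = c
    clean_reward = G(counterfactual) - g  # counterfactual reward for action c
    # 4) Off-policy update of the value of the action NOT taken.
    Q[i, c] += alpha * (clean_reward - Q[i, c])
```

For contrast, a shaped difference reward evaluates the action actually taken, D_i = G(z) - G(z_{-i}), so the exploratory actions of other agents still enter through G(z); the sketch above instead confines exploration entirely to the privately evaluated counterfactual term.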
Similar Resources
CLEANing the reward: counterfactual actions to remove exploratory action noise in multiagent learning (extended abstract)
Learning in multiagent systems can be slow because agents must learn both how to behave in a complex environment and how to account for the actions of other agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent’s reward signal. This learning noise can have unfore...
CLEAN rewards for improving multiagent coordination in the presence of exploration
In cooperative multiagent systems, coordinating the joint actions of agents is difficult. One of the fundamental difficulties in such multiagent systems is the slow learning process, where an agent may not only need to learn how to behave in a complex environment, but may also need to account for the actions of the other learning agents. Here, the inability of agents to distinguish the true envir...
Counterfactual Exploration for Improving Multiagent Learning
In any single agent system, exploration is a critical component of learning. It ensures that all possible actions receive some degree of attention, allowing an agent to converge to good policies. The same concept has been adopted by multiagent learning systems. However, there is a fundamentally different dynamic in multiagent learning: each agent operates in a non-stationary environment, as a d...
Learning from Actions not Taken in Multiagent Systems
In large cooperative multiagent systems, coordinating the actions of the agents is critical to the overall system achieving its intended goal. Even when the agents aim to cooperate, ensuring that the agent actions lead to good system level behavior becomes increasingly difficult as systems become larger. One of the fundamental difficulties in such multiagent systems is the slow learning process...
Publication date: 2014